.\" Manpage for megacc
.\" Contact glen.stecher@temple.edu to correct errors or typos

.TH megacc 1 "27 November 2019" "10.1.6" "MEGA Computational Core"

.SH NAME
megacc \- MEGACC (Molecular Evolutionary Genetics Analysis Computational Core)
.SH SYNOPSIS
.B megacc 
\fB\-a \fIMAO FILE \fB\-d \fIDATA FILE [\fB\-t \fITREE FILE\fR][\fIOPTIONS\fR]
.SH DESCRIPTION
megacc (molecular evolutionary genetics analysis computational core) is an integrated suite of tools for statistics-based comparative analysis of molecular sequence data based on evolutionary principles (Tamura et al. 2011, Kumar et al. 2012). MEGA is used by biologists in a large number of laboratories for reconstructing the evolutionary histories of species and inferring the extent and nature of selective forces shaping the evolution of genes and species. Comprehensive documentation describing the analyses and methods used in MEGA is available at \fIhttp://www.megasoftware.net/webhelp/helpfile.htm\fR. Additionally, a quick-start tutorial has been copied to /usr/share/megacc along with example data files and a chm file containing the help documentation for the GUI version of MEGA. This chm file, although written for the GUI version of MEGA is a valuable reference for megacc users.
.SH OPTIONS
.IP -a 
--analysisOptions 
.nf
\fIMEGA Analysis Options File (MAO)\fR     *required*
.fi
Specify the full path to the Mega Analysis Options (.mao) file.  
This file tells MEGA-CC which analysis to perform as well as which options to use. 
This file is created using the analysis prototyper (megaproto) application that is installed with megacc.
Linux users can launch the prototyper using the megaproto command. Mac users can launch it from the Applications folder.

.IP -c 
--calibration 
.nf
\fICalibration file\fR  *optional*
.fi
Specify the full path to a calibration file that you wish to use. 
The calibration file is used to provide calibration data for tree timing methods (see \fICALIBRATION FORMAT\fR below).

.IP -d 
--data
.nf 
\fIData File\fR *required*
.fi
Specify the full  or relative path to the data file you wish to analyze.  
MEGA (.meg), and Fasta sequence data files are supported for all analyses. 
For distance matrices the MEGA (.meg) format is required.

.IP -f 
--format 
.nf
\fIExport format for sequence alignment\fR *applies to sequence alignment only*
.fi
Sequence alignments can be exported in either the native .meg or FASTA format.
Format values:
MEGA
Fasta

.IP -g 
--groups 
.nf
\fIGroups file\fR
.fi
Specify the full path to the groups file that you wish to use.
This file organizes taxa into groups where each line in the file is a key value pair of the form 
     taxonName=groupName 
Group information is used for certain analyses, for instance, specifying which taxon/taxa
comprise the outgroup for the timetree analysis.

.IP -h 
--help 
.nf
\fIHelp\fR
.fi
Prints this help file document.

.IP -l 
--list	
.nf
\fIInput File List\fR	*optional*
.fi
Specifies a text file which has a list of input data files to be analyzed. 
This option can be used instead of -d or -t to specify input data, in which case, the same
analysis will be performed on all input files listed in the text file and each
output results files will be named using the name of its corresponding input file.
The indicated text file must be formatted such that each line has the full path to
the sequence data file to be used and if a tree file is also provided it is on the same line
but separated by a two pipe characters (e.g. || ). 
See EXAMPLES and LIST FORMAT below.

.IP -n 
--noSummary	
.nf
\fIDo not write out the analysis summary file.\fR *optional*
.fi
By default a file that gives an analysis summary is written.
This option suppresses the export of that file. 
However, if any important messages are generated by the application, they
will be written to this file regardless.

.IP -o 
--outfile     
.nf
\fIOutput Path / Output Dir\fR *optional*
.fi
Specify the full path and base filename (e.g. /myResultsDirectory/myResultName) or
simply the full path and directory of where to save the file
(e.g. /myResultsDirectory) in which case, a unique filename will be chosen
automatically for you.
If no output path is specified, results will be saved in the same directory
as the input data file, with a unique name.

.IP -pfc 
--partition-frequency-cutoff
.nf
\fIPartition Frequency Cutoff (a value between 0.0 and 1.0 - default is 0.5)\fR *optional*
.fi
When bootstrapping is used for tree construction a list of partitions and
frequencies is written to a text file. The partition frequency cutoff
causes any partitions whose frequency is less than the cutoff value
to be ommited from this text file. Set this value to 0.0 to include
all partitions.

.IP -r 
--recursive
.nf	
\fIRecursive directory search\fR *optional*
.fi
If a directory is specified for analysis by default MEGA only searches
the contents of that folder and not any of it's children.  
To include the contents of all folders under the one specified, use this option.

.IP -s 
--silent 	
.nf
\fIDo not write out the progress updates\fR *optional*
.fi
This option prevents progress updates from being written to stdout. 
For long running calculations, this option may produce a modest speedup.

.IP -t 
--tree     
.nf
\fITree File\fR *required for some analyses*
.fi
Specify the full path to the tree file you wish to use. This file must be formatted
according to the Newick file format. Some analyses requires a user provided tree, or optionally allow you to provide 
your own. See
.nf
.RS
.RS
\fIhttp://evolution.genetics.washington.edu/phylip/newicktree.html\fR).
.RE
for a description of the Newick format.
.RE
.fi



.SH EXAMPLES
.PP
This example performs a multiple sequence alignment on codons (it assumes that you have created the file 'Clustal_Codon_Alignment.mao' using the prototyper (megaproto). A fasta file with coding data is used as input and the resulting alignment is output in the MEGA format:

.nf
.RS
megacc -a ~/Documents/Clustal_Codon_Alignment.mao -d ~/Documents/codingData.fas -o ~/Documents/codingDataAligned.meg
.RE
.fi
.PP
This example shows how to construct a neighbor-joining phylogeny for each of a list of sequence data files. 
The analysis will be performed for each file listed in 'listOfDataFiles.txt' and all results will be written to the ~/Documents/outputDirectory/ directory:

.nf
.RS
megacc -a ~/Documents/NJ_Tree_Settings.mao -l ~/Documents/listOfDataFiles.txt -o ~/Documents/outputDirectory/
.RE
.fi
.PP
.SH LIST FORMAT
When using the -l option, each file to be analyzed must be on its own line. For example:

.nf
.RS
~/Documents/myData/seqData1.fas	
~/Documents/myData/seqData2.fas
~/Documents/myData/seqData3.fas
.fi
.RE

If the analyses are to use a user-provided Newick tree file, then the tree files are given on the same line as the data files, following two pipe characters. For example:
.PP
.nf
.RS
~/Documents/myData/seqData1.fas || ~/Documents/myData/treeFile1.nwk
~/Documents/myData/seqData2.fas || ~/Documents/myData/treeFile2.nwk
~/Documents/myData/seqData3.fas || ~/Documents/myData/treeFile3.nwk	
.fi
.RE
.PP
.SH CALIBRATION FORMAT
For the timetree analysis, multiple calibration points can be used to convert relative times to absolute times. Calibrations are given in a text
file which is specified using the -c option and is formatted using one of three formats. In the first format, the node in the tree for a given calibration
is specified by its name, which must be included as an internal node label in the newick tree file used:

.RS
.nf
!NodeName1='name1' minTime=1.75 maxTime=2.25;
!NodeName2='name2' minTime=3.0;
!NodeName3='name3' time=2.5;
.fi
.RE

In the second format, the node in the tree for a given calibration point is indicated by specifying two taxa whose most recent common ancestor is the node
to calibrate:

.RS
.nf
!MRCA='some name1' TaxonA='taxon1 name' TaxonB='taxon2 name' minTime=1.75 maxTime=2.25;
!MRCA='some name2' TaxonA='taxon3 name' TaxonB='taxon4 name' minTime=3.0;
!MRCA='some name3' TaxonA='taxon5 name' TaxonB='taxon6 name' time=2.5;
.fi
.RE

In the third format, target nodes are specified in the same way as the above examples but
probability distributions are specifed for divergence time constraints instead of min and max times:

.RS
.nf
!NodeName1='name1' Distribution=normal mean=6.4 stddev=1.2;
!MRCA='some name2' TaxonA='taxon1 name' TaxonB='taxon2 name' Distribution=exponential time=8.2 decay=0.25;
!MRCA='some name3' TaxonA='taxon1 name' TaxonB='taxon2 name' Distribution=uniform mintime=4 maxtime=6;
!MRCA='some name4' TaxonA='taxon1 name' TaxonB='taxon2 name' Distribution=lognormal offset=7.0 mean=2.38 stddev=0.15;
.fi
.RE

.SH CITING MEGACC
.PP
.RS
Kumar S, Stecher G, Peterson D, and Tamura K (2012) \fBMEGA-CC: Computing Core of Molecular Evolutionary Genetics Analysis Program for Automated and Iterative Data Analysis\fR. \fIBioinformatics\fR 28:2685-2686.
.RE
.RS
Kumar S, Stecher G, Li M, Knyaz C, and Tamura K (2018) \fBMEGAX: Molecular Evolutionary Genetics Analysis across computing platforms\fR. \fIMolecular Biology and Evolution 35:1547-1549\fR.
.RE
.SH BUGS
No known bugs.
.SH AUTHOR
Sudhir Kumar, Glen Stecher, and Koichiro Tamura
.SH COPYRIGHT
copyright 2011-2019 by the authors                                         

